73 research outputs found

    Conditional Gaussian Mixtures

    I show how conditional Gaussians, whose means are conditioned on a random variable, can have their parameters estimated and their likelihoods computed, building upon how regular Gaussians have their own parameters and likelihoods computed. After explaining how to estimate the parameters of Gaussians and conditional Gaussians, I explain how to calculate their likelihoods even if there are missing elements in the data or, in the case of the conditional Gaussian, even if the conditioning variable is missing.
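The marginalization described above has a simple closed form in one dimension: if x ~ N(a·z + b, σ²) and the missing conditioning variable has prior z ~ N(μ_z, σ_z²), then marginally x ~ N(a·μ_z + b, σ² + a²·σ_z²). A minimal pure-Python sketch (illustrative only, not the paper's implementation; all function names are my own):

```python
import math

def fit_conditional_gaussian(zs, xs):
    """Least-squares fit of x = a*z + b + noise; returns (a, b, var)."""
    n = len(zs)
    mz = sum(zs) / n
    mx = sum(xs) / n
    cov = sum((z - mz) * (x - mx) for z, x in zip(zs, xs)) / n
    vz = sum((z - mz) ** 2 for z in zs) / n
    a = cov / vz
    b = mx - a * mz
    var = sum((x - (a * z + b)) ** 2 for z, x in zip(zs, xs)) / n
    return a, b, var

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def likelihood(x, a, b, var, z=None, z_mean=0.0, z_var=1.0):
    """Likelihood of x given the conditioning value z; if z is missing,
    marginalize over a Gaussian prior on z, which keeps x Gaussian:
    x ~ N(a*z_mean + b, var + a**2 * z_var)."""
    if z is not None:
        return gaussian_pdf(x, a * z + b, var)
    return gaussian_pdf(x, a * z_mean + b, var + a * a * z_var)
```

Note how the marginal is broader than any conditional: the uncertainty in the missing conditioning variable is absorbed into the variance.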

    An Introduction to Bayesian Network Theory and Usage

    I present an introduction to some of the concepts within Bayesian networks to help a beginner become familiar with this field's theory. Bayesian networks are a combination of two different mathematical areas: graph theory and probability theory. So, I first give the basic definition of Bayesian networks. This is followed by an elaboration of the underlying graph theory, which involves the arrangement of nodes and edges in a graph. Since Bayesian networks encode one's beliefs about a system of variables, I then discuss, in general, how to update these beliefs when the values of one or more of the variables have been observed. Learning algorithms combine learning the probability distributions with learning the network topology. I conclude Part I by showing how Bayesian networks can be used in various domains, such as the time-series problem of automatic speech recognition. In Part II I then give, in more detail, some of the algorithms needed for working with Bayesian networks.
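To make the belief-updating idea concrete, here is a toy three-node network with illustrative probabilities (my own example, not from the paper): observing WetGrass raises the belief in Rain, computed by summing the joint distribution over the unobserved Sprinkler variable.

```python
from itertools import product

# Toy network: Rain -> WetGrass <- Sprinkler. Probabilities are illustrative.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet = {  # P(WetGrass=True | Sprinkler, Rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(s, r, w):
    """Full joint P(Sprinkler=s, Rain=r, WetGrass=w) from the factored form."""
    pw = P_wet[(s, r)]
    return P_sprinkler[s] * P_rain[r] * (pw if w else 1 - pw)

def posterior_rain(wet=True):
    """P(Rain=True | WetGrass=wet) by enumerating the hidden Sprinkler."""
    num = sum(joint(s, True, wet) for s in (True, False))
    den = sum(joint(s, r, wet) for s, r in product((True, False), repeat=2))
    return num / den
```

Enumeration like this is exponential in the number of hidden variables; the exact and approximate inference algorithms the text surveys exist precisely to do better.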

    Speech recognition with auxiliary information

    Automatic speech recognition (ASR) is a very challenging problem due to the wide variety of data it must be able to deal with. As the standard tool for ASR, hidden Markov models (HMMs) have proven to work well when there is some control over the variety of the data. Relatively new to ASR, dynamic Bayesian networks (DBNs) are more generic models with algorithms that are more flexible than those of HMMs. Unlike in HMMs, various assumptions can be changed without modifying the underlying algorithm and code; these assumptions concern the variables to be modeled, the statistical dependencies between them, and the observations available for certain of the variables. The main objective of this thesis, therefore, is to examine areas where DBNs can be used to relax HMMs' assumptions so as to obtain models that are more robust to the variety of data that ASR must deal with. HMMs model the standard observed features by jointly modeling them with a hidden discrete state variable, with certain constraints placed upon the states and features. DBNs can generalize this modeling framework by incorporating additional "auxiliary" variables to aid the modeling, which HMMs can typically only do with the two variables under certain constraints. The DBN framework is flexible in how this auxiliary variable is introduced. First, auxiliary information aids the modeling through its correlation with the standard features; in the DBN framework, we can make it directly condition the distribution of the standard features. Second, some types of auxiliary information are not strongly correlated with the hidden state, so in the DBN framework we may want to consider the auxiliary variable conditionally independent of the hidden state variable. Third, as auxiliary information tends to be strongly correlated with its previous values in time, I show DBNs using discretized auxiliary variables that model the evolution of the auxiliary information over time. Finally, as auxiliary information can be missing or noisy when a trained system is in use, DBNs can do recognition using just its prior distribution, learned from the auxiliary information observed during training. I investigate these advantages of DBN-based ASR using auxiliary information involving articulator positions, estimated pitch, estimated rate-of-speech, and energy. I also show DBNs to be better at incorporating auxiliary information than hybrid HMM/ANN ASR, which uses artificial neural networks (ANNs), and show that auxiliary information is best introduced in a time-dependent manner. Finally, DBNs with auxiliary information are better able than standard HMM approaches to handle noisy speech; specifically, DBNs with hidden energy as an auxiliary variable, conditioning the distribution of the standard features while being conditionally independent of the state, are more robust to noisy speech than HMMs.
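The core mechanism the thesis describes, conditioning the emission distribution on a discrete auxiliary variable and marginalizing it out when unobserved, can be sketched as follows (illustrative parameters and names of my own, not the thesis's actual system):

```python
import math

def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Emission parameters for one hypothetical HMM state q: for each value of a
# discrete auxiliary variable a (say, low/high rate-of-speech), a prior P(a)
# learned during training and a Gaussian over the acoustic feature x.
params = {
    "low":  {"prior": 0.6, "mean": 0.0, "var": 1.0},
    "high": {"prior": 0.4, "mean": 2.0, "var": 1.5},
}

def emission(x, a=None):
    """p(x | q, a) when the auxiliary value a is observed;
    p(x | q) = sum_a P(a) p(x | q, a) when a is hidden in recognition."""
    if a is not None:
        return gauss(x, params[a]["mean"], params[a]["var"])
    return sum(p["prior"] * gauss(x, p["mean"], p["var"])
               for p in params.values())
```

When a is hidden, the emission becomes a mixture weighted by the prior learned in training, which is what lets recognition proceed even with missing or noisy auxiliary measurements.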

    Automatic Speech Recognition using Dynamic Bayesian Networks with the Energy as an Auxiliary Variable

    In current automatic speech recognition (ASR) systems, the energy is not used as part of the feature vector, despite being a fundamental feature of the speech signal, because the noise inherent in its estimation degrades system performance. In this report we present an alternative approach for introducing the energy into the system so that it can help to enhance recognition. We present experimental results for an ASR system based on dynamic Bayesian networks (DBNs) that uses the energy as an auxiliary variable. DBNs belong to the same family of statistical models as hidden Markov models (HMMs); however, DBNs are a more general framework and allow more flexibility in defining new probabilistic relations between variables. We tried different network topologies and observed the benefit of conditioning the feature vector on the energy. Furthermore, hiding the value of the energy during recognition also improved recognition performance.

    Automatic Speech Recognition using Dynamic Bayesian Networks with both Acoustic and Articulatory Variables

    Current technology for automatic speech recognition (ASR) uses hidden Markov models (HMMs) that recognize speech from the acoustic signal. However, no use is made of the causes of the acoustic signal: the articulators. We present here a dynamic Bayesian network (DBN) model that uses an additional variable to represent the state of the articulators. A particular strength of the system is that, while it uses measured articulatory data during training, it does not need these values during recognition. As Bayesian networks are not often used in the speech community, we give an introduction to them. After describing how they can be used in ASR, we present a system for isolated word recognition using articulatory information. Recognition results show that a system with both acoustics and inferred articulatory positions performs better than a system with acoustics alone.

    Auxiliary Variables in Conditional Gaussian Mixtures for Automatic Speech Recognition

    In previous work, we presented a case study using an estimated pitch value as the conditioning variable in conditional Gaussians, which showed the utility of hiding the pitch values in certain situations and of modeling pitch independently of the hidden state in others. Since only single conditional Gaussians were used in that work, we extend it here to conditional Gaussian mixtures in the emission distributions, making the work more comparable to state-of-the-art automatic speech recognition. We also introduce a rate-of-speech (ROS) variable within the conditional Gaussian mixtures. We find that, under the current methods, using observed pitch or ROS in the recognition phase does not provide improvement. However, systems trained on pitch or ROS may improve over the baseline in the recognition phase when the pitch or ROS is marginalized out.
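A conditional Gaussian mixture of the kind described, with each component mean a linear function of the conditioning variable (e.g. pitch or ROS), can be sketched as follows; when the conditioning variable is marginalized out under a Gaussian prior, each component's mean and variance adjust in closed form. All parameters here are illustrative, not from the paper:

```python
import math

def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Two-component conditional Gaussian mixture: each component's mean is a
# linear function a*z + b of the conditioning variable z.
components = [
    {"w": 0.7, "a": 1.0,  "b": 0.0, "var": 1.0},
    {"w": 0.3, "a": -0.5, "b": 2.0, "var": 2.0},
]

def mixture_likelihood(x, z=None, z_mean=0.0, z_var=1.0):
    """p(x | z) when z is observed; when z is marginalized out under a
    Gaussian prior N(z_mean, z_var), each component's mean becomes
    a*z_mean + b and its variance becomes var + a**2 * z_var."""
    total = 0.0
    for c in components:
        if z is not None:
            total += c["w"] * gauss(x, c["a"] * z + c["b"], c["var"])
        else:
            total += c["w"] * gauss(x, c["a"] * z_mean + c["b"],
                                    c["var"] + c["a"] ** 2 * z_var)
    return total
```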

    Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition

    Standard hidden Markov models (HMMs), as used in automatic speech recognition (ASR), calculate their emission probabilities with an artificial neural network (ANN) or a Gaussian distribution conditioned on the hidden state variable, treating the emissions as independent of any other variable in the model. Recent work showed the benefit of conditioning the emission distributions on a discrete auxiliary variable, which is observed in training and hidden in recognition. Related work has shown the utility of conditioning the emission distributions on a continuous auxiliary variable. We apply mixed Bayesian networks (BNs) to extend these works by introducing a continuous auxiliary variable that is observed in training but hidden in recognition. We find that an auxiliary pitch variable that is itself conditioned on the hidden state can degrade performance unless the auxiliary variable is also hidden. The performance can, furthermore, be improved by making the auxiliary pitch variable independent of the hidden state.

    Modelling auxiliary information (pitch frequency) in hybrid HMM/ANN based ASR systems

    Automatic speech recognition (ASR) systems typically use smoothed spectral features as acoustic observations. Recent studies have shown that complementing these standard features with auxiliary information can improve system performance. The previously proposed systems were studied in the framework of Gaussian mixture models (GMMs). In this paper, we study and compare different ways to include auxiliary information in a state-of-the-art hybrid HMM/ANN system, focusing on pitch frequency as the auxiliary information. We evaluate the proposed system on two different ASR tasks, namely isolated word recognition and connected word recognition. Our results complement previous efforts to incorporate auxiliary information in ASR systems and show that pitch frequency can indeed be used to improve recognition performance.

    Modeling Auxiliary Information in Bayesian Network Based ASR

    Automatic speech recognition bases its models on acoustic features derived from the speech signal. Some researchers have investigated replacing or supplementing these features with information that cannot be measured automatically with precision (articulator positions, pitch, gender, etc.); consequently, automatic estimates of the desired information must be generated, and their imprecision can degrade performance. In this paper, we describe a system that treats pitch as auxiliary information within the framework of Bayesian networks, resulting in improved performance.

    Dynamic Bayesian Network Based Speech Recognition with Pitch and Energy as Auxiliary Variables

    Pitch and energy are two fundamental features describing speech and are important in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they usually cause a significant degradation in recognition performance due to the noise inherent in estimating or modeling them. In this paper, we show experimentally how this can be corrected either by conditioning the emission distributions upon these features or by marginalizing these features out in recognition. Since this is not straightforward with standard hidden Markov models (HMMs), this work has been performed in the framework of dynamic Bayesian networks (DBNs), which give more flexibility in defining the topology of the emission distributions and in specifying whether variables should be marginalized out.